Smart Paper Notes

Clustering with Tangles: Algorithmic Framework and Theoretical Guarantees

Solveig Klepper, Christian Elbracht, Diego Fioravanti, Jakob Kneip, Luca Rendsburg, Maximilian Teegen, Ulrike von Luxburg

Categories: Machine Learning | Machine Learning (Statistics)

2020-06-25

Originally, tangles were invented as an abstract tool in mathematical graph theory to prove the famous graph minor theorem. In this paper, we showcase the practical potential of tangles in machine learning applications. Given a collection of cuts of any dataset, tangles aggregate these cuts to point in the direction of a dense structure. As a result, a cluster is softly characterized by a set of consistent pointers. This highly flexible approach can solve clustering problems in various setups, from questionnaire data and community detection in graphs to clustering points in metric spaces. The output of our proposed framework is hierarchical and induces the notion of a soft dendrogram, which can help explore the cluster structure of a dataset. The computational complexity of aggregating the cuts is linear in the number of data points. Thus the bottleneck of the tangle approach is generating the cuts, for which simple and fast algorithms form a sufficient basis. In our paper we construct the algorithmic framework for clustering with tangles, prove theoretical guarantees in various settings, and provide extensive simulations and use cases. Python code is available on GitHub.
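To make the idea of "consistent pointers" concrete, here is a minimal sketch, not the authors' implementation. It assumes the commonly stated consistency condition for tangles: each cut is oriented toward one of its two sides, and any three chosen sides (repeats allowed) must share at least `agreement` points. The function name `find_tangles`, its parameters, and the brute-force search over all orientations are illustrative assumptions; the brute force is exponential in the number of cuts and is only meant for toy inputs.

```python
from itertools import combinations_with_replacement, product


def find_tangles(cuts, n_points, agreement):
    """Return every consistent orientation of the given cuts.

    cuts      : list of sets; each set is one side of a bipartition of range(n_points)
    agreement : minimum size of the intersection of any three chosen sides
                (assumed consistency condition, see lead-in)
    Each result is a tuple of booleans: True means "point to the listed side",
    False means "point to its complement".
    """
    universe = frozenset(range(n_points))
    tangles = []
    # Hypothetical brute force: try all 2^len(cuts) orientations.
    for choice in product((True, False), repeat=len(cuts)):
        sides = [
            frozenset(cut) if keep else universe - frozenset(cut)
            for cut, keep in zip(cuts, choice)
        ]
        # Triples with replacement also cover single sides and pairs.
        consistent = all(
            len(a & b & c) >= agreement
            for a, b, c in combinations_with_replacement(sides, 3)
        )
        if consistent:
            tangles.append(choice)
    return tangles


# Toy data: points 0-3 form one group, points 4-7 another.
cuts = [{0, 1, 2, 3}, {0, 1, 2, 3, 4}, {0, 1, 2}]
result = find_tangles(cuts, n_points=8, agreement=3)
print(result)
```

On this toy input, the consistent orientations are the one pointing every cut toward the low-numbered points and the one pointing every cut toward the high-numbered points, which matches the two intended groups. Raising `agreement` to 4 leaves no consistent orientation, since some sides contain only three points.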


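This paper documents a dataset of sentence pairs between Faroese and Danish, produced at ITU Copenhagen. The data covers translation from both source languages and is intended for use as training data for machine translation systems for this language pair.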


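In this study, we show how existing approaches that use generative adversarial networks (GANs) as economic scenario generators (ESGs) can be extended to a whole internal market risk model, with enough risk factors to model the full range of an insurance company's investments over the one-year time horizon required by Solvency 2. We demonstrate that the results of a GAN-based internal model are similar to those of regulatory-approved internal models in Europe. GAN-based models can therefore be seen as a data-driven alternative for market risk modeling.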
